Nonparametric estimation of pair-copula 
constructions with the empirical pair-copula 

I. HOBÆK HAFF and J. SEGERS

January 24, 2012 

Abstract

A pair-copula construction is a decomposition of a multivariate copula into a
structured system, called regular vine, of bivariate copulae or pair-copulae.
The standard practice is to model these pair-copulae parametrically, which comes
at the cost of a large model risk, with errors propagating throughout the vine
structure. The empirical pair-copula proposed in the paper provides a
nonparametric alternative still achieving the parametric convergence rate. It
can be used as a basis for inference on dependence measures, for selecting and
pruning the vine structure, and for hypothesis tests concerning the form of the
pair-copulae.

Key words: pair-copula, regular vine, empirical copula, resampling, Spearman
rank correlation, model selection, independence, smoothing

1 Introduction



Pair-copula constructions, introduced in Joe (1996) and developed in Bedford and
Cooke (2001, 2002) and Kurowicka and Cooke (2006), provide a flexible, but
manageable way of modelling the dependence within a random vector. The crucial
model assumption is that the copulae of certain bivariate conditional
distributions do not depend on the value of the conditioning variable or vector.
In this way, a copula in dimension $d$ is completely determined by the collection
of pairwise connections between conditional distributions for which the model
assumption holds, called the vine structure of the copula, together with a set of
$d(d-1)/2$ bivariate copulae, called pair-copulae. These are grouped into levels
according to the number of conditioning variables of the corresponding
conditional distributions, going from the ground level, comprising $d-1$
pair-copulae which are just bivariate margins of the parent copula, up to the top
level, consisting of the single copula being the copula of the remaining two
variables, conditionally on the $d-2$ others.

Current practice is to model the pair-copulae parametrically, estimating the
parameters with a composite or pseudo-likelihood method, that is either
frequentist, as in Aas et al. (2009) and Hobæk Haff (2012), or Bayesian, as in
Min and Czado (2010, 2011). Fitting a pair-copula construction therefore requires
the selection of $d(d-1)/2$ copula models. The recursive dependence of inference
concerning copulae at a certain level on the copulae fitted in the lower levels
augments the model risk. Thus, bad model choices propagate errors throughout the
vine structure.

In this paper, a nonparametric pair-copula estimator is proposed instead. Of
course, if the parametric model is correctly specified, a parametric estimator
will be more efficient. But the nonparametric method is more robust, as it does
not rely on a parametric specification. The estimator is based on an idea similar
to the empirical copula (Rüschendorf, 1976; Deheuvels, 1979), and is therefore
called the empirical pair-copula. Although it joins conditional distributions,
the empirical pair-copula still achieves the parametric rate, regardless of the
number of conditioning variables, thanks to the model assumption that these
copulae do not depend on the conditioning variable.

The empirical pair-copula yields nonparametric estimators of dependence measures
such as conditional Spearman rank correlations. These estimates can safely be
used in vine structure selection algorithms, yielding a nonparametric alternative
to the procedure proposed in Dissmann et al. (2011). Other applications of the
empirical pair-copula concern testing for conditional independence at certain
levels, aiming at pruning or truncating of the vine structure as in Brechmann et
al. (2012), as well as goodness-of-fit testing in combination with parametric
methods. The new method is supported by extensive simulations and is illustrated
by case studies involving financial and precipitation data.



2 Pair-copula constructions 

First, let $F$ be the bivariate continuous distribution function of a random pair
$(X_1, X_2)$, with margins $F_1$ and $F_2$ and copula $C$, that is,

$$ F(x_1, x_2) = C\{F_1(x_1), F_2(x_2)\}. $$

The bivariate density $f$ of $F$ then satisfies

$$ f(x_1, x_2) = c\{F_1(x_1), F_2(x_2)\}\, f_1(x_1)\, f_2(x_2), $$

where $f_1$ and $f_2$ denote the marginal density functions and $c$ is the copula
density, and the conditional density of $X_1$, given $X_2 = x_2$, is

$$ f_{1|2}(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f_2(x_2)} = c\{F_1(x_1), F_2(x_2)\}\, f_1(x_1). \tag{1} $$

The corresponding conditional distribution function satisfies

$$
\begin{aligned}
F_{1|2}(x_1 \mid x_2)
&= \int_{-\infty}^{x_1} f_{1|2}(z \mid x_2)\, dz
 = \int_{-\infty}^{x_1} c\{F_1(z), F_2(x_2)\}\, f_1(z)\, dz \\
&= \int_0^{F_1(x_1)} c\{u, F_2(x_2)\}\, du
 = \left. \frac{\partial}{\partial u_2} C(u_1, u_2) \right|_{(u_1, u_2) = (F_1(x_1), F_2(x_2))} \\
&= C^{[2]}\{F_1(x_1), F_2(x_2)\}.
\end{aligned}
\tag{2}
$$
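Identity (2) says that the conditional distribution function is the partial derivative of the copula in its second argument. The following minimal sketch checks this numerically for one closed-form family. The Clayton copula, the parameter value and all function names are illustrative assumptions chosen only because the derivative is available in closed form; they are not part of the estimation method of the paper.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_h(u, v, theta):
    """Closed-form partial derivative of C(u, v) with respect to v,
    i.e. the conditional probability pr(U <= u | V = v)."""
    s = u ** -theta + v ** -theta - 1.0
    return v ** (-theta - 1.0) * s ** (-1.0 / theta - 1.0)


def numerical_h(u, v, theta, eps=1e-6):
    """Central finite difference of C in its second argument."""
    upper = clayton_cdf(u, v + eps, theta)
    lower = clayton_cdf(u, v - eps, theta)
    return (upper - lower) / (2.0 * eps)


if __name__ == "__main__":
    u, v, theta = 0.3, 0.7, 2.0   # hypothetical evaluation point and parameter
    print(clayton_h(u, v, theta), numerical_h(u, v, theta))
```

The two printed numbers should agree up to the finite-difference error, and the closed form returns 1 at u = 1, as a conditional distribution function must.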



Next, let $f$ be the $d$-variate probability density function of the random
vector $(X_1, \ldots, X_d)$ with $d \ge 3$. Let $i$ and $j$ be distinct elements
of $\{1, \ldots, d\}$ and let $v$ be a non-empty subset of
$\{1, \ldots, d\} \setminus \{i, j\}$. Write $X_v = (X_k : k \in v)$ and similarly
for $x_v$. Applying (1) to the conditional density $f_{ij|v}(\cdot, \cdot \mid x_v)$
of the pair $(X_i, X_j)$, given $X_v = x_v$, associated with the copula
$C_{ij|v}(\cdot, \cdot \mid x_v)$ and its density $c_{ij|v}(\cdot, \cdot \mid x_v)$,
yields

$$ f_{i|j \cup v}(x_i \mid x_j, x_v) = c_{ij|v}\{F_{i|v}(x_i \mid x_v), F_{j|v}(x_j \mid x_v) \mid x_v\}\, f_{i|v}(x_i \mid x_v). \tag{3} $$

From (2) it follows that

$$ F_{i|j \cup v}(x_i \mid x_j, x_v) = C^{[2]}_{ij|v}\{F_{i|v}(x_i \mid x_v), F_{j|v}(x_j \mid x_v) \mid x_v\}. \tag{4} $$

Equation (3) provides a way to write $f_{i|j \cup v}$ in terms of $c_{ij|v}$ and
$f_{i|v}$, with one variable less in the conditioning set. Applying this equation
recursively to the terms on the right-hand side of the identity

$$ f(x_1, \ldots, x_d) = f_1(x_1)\, f_{2|1}(x_2 \mid x_1) \cdots f_{d|12\ldots(d-1)}(x_d \mid x_1, \ldots, x_{d-1}) $$

yields expressions of the form

$$ f(x_1, \ldots, x_d) = \prod_{k=1}^{d} f_k(x_k) \prod_{\ell=1}^{d-1} \prod_{(i,j,v)} c_{ij|v}\{F_{i|v}(x_i \mid x_v), F_{j|v}(x_j \mid x_v) \mid x_v\}. \tag{5} $$

The number of terms in the third product is equal to $d - \ell$. For each triple
$(i, j, v)$ in the product, $v$ is a subset of $\{1, \ldots, d\} \setminus \{i, j\}$
with exactly $\ell - 1$ elements. The precise list of combinatorial rules that the
system of triples $(i, j, v)$ must obey makes them constitute a regular vine as in
Bedford and Cooke (2001, 2002). Examples of two such structures in dimension five
are given in Figure 1.
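As a small worked case of the recursion, take three variables ordered as 1, 2, 3. This illustration is not in the original text; it only applies (1) and (3) to the chain-rule factorization, with the conditioning order chosen here for convenience.

```latex
\begin{aligned}
f(x_1, x_2, x_3)
 &= f_1(x_1)\, f_{2|1}(x_2 \mid x_1)\, f_{3|12}(x_3 \mid x_1, x_2), \\
f_{2|1}(x_2 \mid x_1)
 &= c_{12}\{F_1(x_1), F_2(x_2)\}\, f_2(x_2), \\
f_{3|12}(x_3 \mid x_1, x_2)
 &= c_{13|2}\{F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2) \mid x_2\}\,
    c_{23}\{F_2(x_2), F_3(x_3)\}\, f_3(x_3), \\
f(x_1, x_2, x_3)
 &= f_1(x_1)\, f_2(x_2)\, f_3(x_3)\;
    c_{12}\{F_1(x_1), F_2(x_2)\}\; c_{23}\{F_2(x_2), F_3(x_3)\}\;
    c_{13|2}\{F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2) \mid x_2\}.
\end{aligned}
```

The ground level contributes two pair-copula densities and the second level one, giving three in total, which is $d(d-1)/2$ for $d = 3$.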

Assume that for a specific choice of $(i, j, v)$, the copula density $c_{ij|v}$
does not depend on the value of the conditioning argument $x_v$, that is,
$c_{ij|v}(u_i, u_j \mid x_v)$ is constant in $x_v$. Since the corresponding copula
$C_{ij|v}(\cdot, \cdot \mid x_v)$ is equal to the joint distribution function of
$(F_{i|v}(X_i \mid X_v), F_{j|v}(X_j \mid X_v))$ given $X_v = x_v$, we find that the
random pair $(F_{i|v}(X_i \mid X_v), F_{j|v}(X_j \mid X_v))$ must be independent of
the random vector $X_v$. Obviously, the converse must hold as well. In that case,
equations (3) and (4) simplify to

$$ f_{i|j \cup v}(x_i \mid x_j, x_v) = c_{ij|v}\{F_{i|v}(x_i \mid x_v), F_{j|v}(x_j \mid x_v)\}\, f_{i|v}(x_i \mid x_v), \tag{6} $$

$$ F_{i|j \cup v}(x_i \mid x_j, x_v) = C^{[2]}_{ij|v}\{F_{i|v}(x_i \mid x_v), F_{j|v}(x_j \mid x_v)\}. \tag{7} $$

If it is true for all triples $(i, j, v)$ in the regular vine in (5), we arrive at
the pair-copula construction (Joe, 1996; Kurowicka and Cooke, 2006)

$$ f(x_1, \ldots, x_d) = \prod_{k=1}^{d} f_k(x_k) \prod_{\ell=1}^{d-1} \prod_{(i,j,v)} c_{ij|v}\{F_{i|v}(x_i \mid x_v), F_{j|v}(x_j \mid x_v)\}, \tag{8} $$

that provides a decomposition of a $d$-variate density in terms of $d$ univariate
densities and $d(d-1)/2$ bivariate copula densities. The pair-copula construction
corresponding to the drawable vine in the left panel of Figure 1 is

$$
\begin{aligned}
c\{F_1(x_1), \ldots, F_5(x_5)\}
={}& c_{12}\{F_1(x_1), F_2(x_2)\}\; c_{23}\{F_2(x_2), F_3(x_3)\}\; c_{34}\{F_3(x_3), F_4(x_4)\}\; c_{45}\{F_4(x_4), F_5(x_5)\} \\
& \times c_{13|2}\{F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2)\}\; c_{24|3}\{F_{2|3}(x_2 \mid x_3), F_{4|3}(x_4 \mid x_3)\}\; c_{35|4}\{F_{3|4}(x_3 \mid x_4), F_{5|4}(x_5 \mid x_4)\} \\
& \times c_{14|23}\{F_{1|23}(x_1 \mid x_2, x_3), F_{4|23}(x_4 \mid x_2, x_3)\}\; c_{25|34}\{F_{2|34}(x_2 \mid x_3, x_4), F_{5|34}(x_5 \mid x_3, x_4)\} \\
& \times c_{15|234}\{F_{1|234}(x_1 \mid x_2, x_3, x_4), F_{5|234}(x_5 \mid x_2, x_3, x_4)\}.
\end{aligned}
$$

The assumption that the pair-copulae do not depend on the value of the
conditioning argument is a nonparametric shape constraint that is satisfied for
instance by the multivariate Student's t and Clayton copulae (Hobæk Haff et al.,
2010). Even if the assumption does not hold in general, it still provides a
reasonable approximation to the true distribution in many cases.

3 Empirical pair-copula 

3.1 Estimator 

Let $X_t = (X_{1t}, \ldots, X_{dt})$, for $t = 1, \ldots, n$, be a $d$-variate
random sample from a distribution function $F$ with density $f$, admitting a
pair-copula construction (8) with a known regular vine structure. Choice of the
vine structure is a difficult problem, which we will address in Section 4.2.
Consider the ground level normalized ranks

$$ \hat U_{i,tn} = \frac{1}{n+1} \sum_{s=1}^{n} I(X_{is} \le X_{it}) \qquad (i = 1, \ldots, d;\; t = 1, \ldots, n). $$

The ground level empirical pair-copula is simply the classical empirical copula

$$ C_{ij,n}(u_i, u_j) = \frac{1}{n} \sum_{t=1}^{n} I(\hat U_{i,tn} \le u_i,\; \hat U_{j,tn} \le u_j) \qquad (i, j \in \{1, \ldots, d\};\; i \ne j). $$

Use finite differencing to obtain an estimator of the conditional distribution
function: writing $C_{ij,n}(A) = n^{-1} \sum_{t=1}^{n} I\{(\hat U_{i,tn}, \hat U_{j,tn}) \in A\}$
for $A \subset \mathbb{R}^2$ and given a bandwidth $h > 0$, first put

$$
\hat C^{[2]}_{ij,n}(u_i, u_j)
= \frac{C_{ij,n}([0, u_i] \times [u_j - h, u_j + h])}{C_{ij,n}([0, 1] \times [u_j - h, u_j + h])}
= \frac{\sum_{s=1}^{n} I(\hat U_{i,sn} \le u_i,\; |\hat U_{j,sn} - u_j| \le h)}{\sum_{s=1}^{n} I(|\hat U_{j,sn} - u_j| \le h)}
\tag{9}
$$

and then, following (2),

$$
\hat F_{i|j,n}(X_{it} \mid X_{jt})
= \hat C^{[2]}_{ij,n}(\hat U_{i,tn}, \hat U_{j,tn})
= \frac{\sum_{s=1}^{n} I(\hat U_{i,sn} \le \hat U_{i,tn},\; |\hat U_{j,sn} - \hat U_{j,tn}| \le h)}{\sum_{s=1}^{n} I(|\hat U_{j,sn} - \hat U_{j,tn}| \le h)}.
\tag{10}
$$

The denominator of (10) is approximately equal to $2nh$, except at the borders,
where it is smaller, providing a boundary correction. As the smoothing step in (9)
takes place on a uniform (0, 1) scale, the choice of bandwidth $h$ does not depend
on the marginal distributions; in fact, (10) is a kind of nearest-neighbour
estimator. Bandwidth selection will be addressed in Section 3.4, where it will be
seen that a slight degree of undersmoothing is advisable.
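The ground-level quantities above translate almost line by line into code. The sketch below is a minimal, unoptimized illustration using only the Python standard library; the function names, the simulated data and the bandwidth value are assumptions made for the example, not part of the paper.

```python
import random


def normalized_ranks(x):
    """Ranks divided by n + 1, so that every value lies strictly inside (0, 1)."""
    n = len(x)
    return [sum(xs <= xt for xs in x) / (n + 1) for xt in x]


def empirical_copula(u_i, u_j, a, b):
    """Fraction of rank pairs (u_i[t], u_j[t]) with u_i[t] <= a and u_j[t] <= b."""
    n = len(u_i)
    return sum(ui <= a and uj <= b for ui, uj in zip(u_i, u_j)) / n


def cond_cdf(u_i, u_j, a, b, h):
    """Finite-difference estimate of the partial derivative of the copula in its
    second argument at (a, b): among the points whose second rank lies within h
    of b, the share whose first rank is at most a."""
    window = [ui for ui, uj in zip(u_i, u_j) if abs(uj - b) <= h]
    if not window:                      # only possible if b is far from all ranks
        return float("nan")
    return sum(ui <= a for ui in window) / len(window)


def conditional_probabilities(u_i, u_j, h):
    """Estimated conditional probabilities of X_i given X_j at every observation."""
    return [cond_cdf(u_i, u_j, a, b, h) for a, b in zip(u_i, u_j)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [0.7 * a + 0.3 * rng.gauss(0, 1) for a in x1]   # dependent on x1
    u1, u2 = normalized_ranks(x1), normalized_ranks(x2)
    h = 0.5 * n ** (-1 / 3)             # illustrative bandwidth (assumption)
    print(empirical_copula(u1, u2, 0.5, 0.5))
    print(conditional_probabilities(u1, u2, h)[:3])
```

Because the window is defined on the rank scale, the same bandwidth can be used whatever the marginal distributions of the data are.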

For higher levels, we proceed recursively, exploiting the assumption that the
pair-copulae do not depend on the value of the conditioning argument. The
unwinding of the recursion depends on the given vine structure. Let $(i, j, v)$ be
a triple in the vine decomposition (8); in particular, $v$ is a subset of
$\{1, \ldots, d\}$ and $i$ and $j$ are distinct elements of
$\{1, \ldots, d\} \setminus v$. Suppose that the estimators
$\hat F_{k|v,n}(X_{kr} \mid X_{vr})$ have been defined for $k \in \{i, j\}$ and
$r = 1, \ldots, n$; here $X_{vr}$ denotes the random vector $(X_{mr} : m \in v)$.
The normalized ranks of the estimated conditional probabilities are

$$
\hat U_{k|v,tn} = \frac{1}{n+1} \sum_{r=1}^{n} I\{\hat F_{k|v,n}(X_{kr} \mid X_{vr}) \le \hat F_{k|v,n}(X_{kt} \mid X_{vt})\}
\qquad (k \in \{i, j\};\; t = 1, \ldots, n).
\tag{11}
$$

The empirical pair-copula is then defined by

$$ \hat C_{ij|v,n}(u_i, u_j) = \frac{1}{n} \sum_{t=1}^{n} I(\hat U_{i|v,tn} \le u_i,\; \hat U_{j|v,tn} \le u_j). \tag{12} $$

Again, apply finite differencing to get hold of the conditional distributions:
first,

$$
\hat C^{[2]}_{ij|v,n}(u_i, u_j)
= \frac{\hat C_{ij|v,n}([0, u_i] \times [u_j - h, u_j + h])}{\hat C_{ij|v,n}([0, 1] \times [u_j - h, u_j + h])}
= \frac{\sum_{s=1}^{n} I(\hat U_{i|v,sn} \le u_i,\; |\hat U_{j|v,sn} - u_j| \le h)}{\sum_{s=1}^{n} I(|\hat U_{j|v,sn} - u_j| \le h)}
\tag{13}
$$

and then, following (7),

$$
\hat F_{i|v \cup j,n}(X_{it} \mid X_{v \cup j,t})
= \hat C^{[2]}_{ij|v,n}(\hat U_{i|v,tn}, \hat U_{j|v,tn})
= \frac{\sum_{s=1}^{n} I(\hat U_{i|v,sn} \le \hat U_{i|v,tn},\; |\hat U_{j|v,sn} - \hat U_{j|v,tn}| \le h)}{\sum_{s=1}^{n} I(|\hat U_{j|v,sn} - \hat U_{j|v,tn}| \le h)}.
\tag{14}
$$

We proceed this way, recursively from the ground level, $\ell = 1$, where $v$ is
the empty set, to the top level, $\ell = d - 1$, with $v$ consisting of $d - 2$
elements, adding a variable for each level.

The empirical pair-copula estimates the pair-copula distribution functions.
Kolbjørnsen and Stien (2008) propose a nonparametric estimator for the
pair-copula density, with variables transformed to the Gaussian rather than the
uniform domain, to mitigate boundary effects.
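The recursion of this section, from ranks to conditional probabilities and back to ranks, can be sketched for the simplest non-trivial case: three variables in a drawable vine 1, 2, 3, where the second level estimates the copula of variables 1 and 3 given variable 2. This is a minimal sketch under stated assumptions; the helper names, the simulated data and the bandwidth are hypothetical, and no attempt is made at efficiency.

```python
import random


def ranks(z):
    """Normalized ranks in (0, 1); tied values share the largest rank."""
    n = len(z)
    return [sum(a <= b for a in z) / (n + 1) for b in z]


def h_function(u_i, u_j, h):
    """For each t, the share of points s with u_i[s] <= u_i[t] among those
    with |u_j[s] - u_j[t]| <= h (the finite-difference step)."""
    out = []
    for a, b in zip(u_i, u_j):
        window = [ui for ui, uj in zip(u_i, u_j) if abs(uj - b) <= h]
        out.append(sum(ui <= a for ui in window) / len(window))
    return out


def level_two_ranks(x1, x2, x3, h):
    """Second-level pseudo-observations for the pair (1, 3) given 2."""
    u1, u2, u3 = ranks(x1), ranks(x2), ranks(x3)
    f1_given_2 = h_function(u1, u2, h)   # estimated conditional cdf of X1 given X2
    f3_given_2 = h_function(u3, u2, h)   # estimated conditional cdf of X3 given X2
    return ranks(f1_given_2), ranks(f3_given_2)   # re-rank before the next level


def empirical_pair_copula(v_i, v_j, a, b):
    """Empirical copula of the re-ranked conditional probabilities."""
    n = len(v_i)
    return sum(p <= a and q <= b for p, q in zip(v_i, v_j)) / n


if __name__ == "__main__":
    rng = random.Random(1)
    n = 300
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [a + rng.gauss(0, 1) for a in x2]   # X1 and X3 depend on X2 only,
    x3 = [a + rng.gauss(0, 1) for a in x2]   # so they are independent given X2
    v1, v3 = level_two_ranks(x1, x2, x3, 0.5 * n ** (-1 / 3))
    print(empirical_pair_copula(v1, v3, 0.5, 0.5))   # should be close to 0.25
```

In the simulated example the two outer variables are conditionally independent given the middle one, so the estimate at (0.5, 0.5) should be near the independence value 0.25.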



3.2 Asymptotic distribution 

Let $(i, j, v)$ be a triple in the vine decomposition (8). Because of the
assumption that the copula of the conditional distribution of $(X_i, X_j)$ given
$X_v = x_v$ does not depend on the value of $x_v$, the pair-copula $C_{ij|v}$ is in
fact equal to the unconditional distribution function of the random pair
$(F_{i|v}(X_i \mid X_v), F_{j|v}(X_j \mid X_v))$:

$$
\begin{aligned}
\mathrm{pr}\{F_{i|v}(X_i \mid X_v) \le u_i,\; F_{j|v}(X_j \mid X_v) \le u_j\}
&= \int \mathrm{pr}\{F_{i|v}(X_i \mid X_v) \le u_i,\; F_{j|v}(X_j \mid X_v) \le u_j \mid X_v = x_v\}\, f_v(x_v)\, dx_v \\
&= \int C_{ij|v}(u_i, u_j \mid x_v)\, f_v(x_v)\, dx_v \\
&= \int C_{ij|v}(u_i, u_j)\, f_v(x_v)\, dx_v = C_{ij|v}(u_i, u_j).
\end{aligned}
$$

Therefore, it is reasonable to expect that it can be estimated at the parametric
rate $O_p(n^{-1/2})$.

Define the random variables

$$ U_{k|v,t} = F_{k|v}(X_{kt} \mid X_{vt}) \qquad (k \in \{i, j\};\; t = 1, \ldots, n). \tag{15} $$

We conjecture that under suitable smoothness conditions on the copula density $c$
and growth conditions on the bandwidth sequence $h = h_n$, the empirical
pair-copula (12) satisfies

$$
\begin{aligned}
\hat{\mathbb{C}}_{ij|v,n}(u_i, u_j)
&= n^{1/2}\{\hat C_{ij|v,n}(u_i, u_j) - C_{ij|v}(u_i, u_j)\} \\
&= n^{-1/2} \sum_{t=1}^{n} \{I(U_{i|v,t} \le u_i,\; U_{j|v,t} \le u_j) - C_{ij|v}(u_i, u_j)\} \\
&\quad - C^{[1]}_{ij|v}(u_i, u_j)\, n^{-1/2} \sum_{t=1}^{n} \{I(U_{i|v,t} \le u_i) - u_i\} \\
&\quad - C^{[2]}_{ij|v}(u_i, u_j)\, n^{-1/2} \sum_{t=1}^{n} \{I(U_{j|v,t} \le u_j) - u_j\} + o_p(1).
\end{aligned}
\tag{16}
$$

The expansion (16) is suggested by tedious calculations and supported by
extensive simulations summarized in Section 3.4.

Incidentally, the right-hand side of (16) coincides with the expansion for the
empirical copula process of the unobservable random pairs
$(U_{i|v,t}, U_{j|v,t})$, for $t = 1, \ldots, n$. This empirical copula would arise
if the estimated conditional distribution functions $\hat F_{k|v,n}$ in equation
(11) were replaced by the true ones, $F_{k|v}$, with normalized ranks

$$
U_{k|v,tn} = \frac{1}{n+1} \sum_{r=1}^{n} I\{F_{k|v}(X_{kr} \mid X_{vr}) \le F_{k|v}(X_{kt} \mid X_{vt})\}
\qquad (k \in \{i, j\};\; t = 1, \ldots, n),
\tag{17}
$$

and the empirical copula

$$ C_{ij|v,n}(u_i, u_j) = \frac{1}{n} \sum_{t=1}^{n} I(U_{i|v,tn} \le u_i,\; U_{j|v,tn} \le u_j), \tag{18} $$

without hats. By theory going back to Rüschendorf (1976) and Stute (1984),
equation (16) holds when the empirical pair-copula $\hat C_{ij|v,n}$ in (12) is
replaced by the empirical copula $C_{ij|v,n}$ in (18). As we are working with the
ranks of the variables $\hat F_{k|v,n}(X_{kt} \mid X_{vt})$ $(t = 1, \ldots, n)$,
rather than the values themselves, it is intuitively not unreasonable to expect
that replacing $F_{k|v}$ by $\hat F_{k|v,n}$ makes no difference asymptotically.
For some recent references on the empirical copula see Fermanian et al. (2004),
Tsukahara (2005), van der Vaart and Wellner (2007), and Segers (2012).

The expansion in (16) implies that the empirical pair-copula is asymptotically
normal,

$$ n^{1/2}\{\hat C_{ij|v,n}(u_i, u_j) - C_{ij|v}(u_i, u_j)\} \xrightarrow{d} N\bigl(0, \sigma^2_{ij|v}(u_i, u_j)\bigr) \qquad (n \to \infty), \tag{19} $$

with asymptotic variance equal to

$$
\begin{aligned}
\sigma^2_{ij|v}(u_i, u_j)
={}& C_{ij|v}(1 - C_{ij|v}) + \bigl(C^{[1]}_{ij|v}\bigr)^2 u_i(1 - u_i) + \bigl(C^{[2]}_{ij|v}\bigr)^2 u_j(1 - u_j) \\
& - 2\, C^{[1]}_{ij|v}\, C_{ij|v}\,(1 - u_i) - 2\, C^{[2]}_{ij|v}\, C_{ij|v}\,(1 - u_j)
  + 2\, C^{[1]}_{ij|v}\, C^{[2]}_{ij|v}\,(C_{ij|v} - u_i u_j),
\end{aligned}
\tag{20}
$$

where the arguments $(u_i, u_j)$ of $C_{ij|v}$ and its partial derivatives have
been suppressed in the notation. Replacing $C_{ij|v}$ and its derivatives by the
estimators (12)-(13) yields a plug-in estimator $\hat\sigma^2_{ij|v}(u_i, u_j)$ for
the asymptotic variance.
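The asymptotic variance is the variance of the three-term influence function in the expansion of the empirical pair-copula process, so it can be evaluated directly from the copula value and its two partial derivatives. The sketch below is an illustration under that reading of (20); the function and argument names are hypothetical. Under independence, where the copula is the product of its arguments, the expression should collapse to u(1 - u)v(1 - v), which gives a quick sanity check.

```python
def asymptotic_variance(c, c1, c2, u_i, u_j):
    """Variance of I(U<=u_i, V<=u_j) - c1 * I(U<=u_i) - c2 * I(V<=u_j),
    where c is the copula value at (u_i, u_j) and c1, c2 are its partial
    derivatives in the first and second argument."""
    return (c * (1.0 - c)
            + c1 ** 2 * u_i * (1.0 - u_i)
            + c2 ** 2 * u_j * (1.0 - u_j)
            - 2.0 * c1 * c * (1.0 - u_i)
            - 2.0 * c2 * c * (1.0 - u_j)
            + 2.0 * c1 * c2 * (c - u_i * u_j))


if __name__ == "__main__":
    u, v = 0.4, 0.2
    # Independence copula: C = u * v, dC/du = v, dC/dv = u.
    print(asymptotic_variance(u * v, v, u, u, v))
    print(u * (1 - u) * v * (1 - v))
```

A plug-in estimate is obtained by passing estimated values of the copula and its derivatives instead of the true ones.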

3.3 Resampling 

The empirical pair-copula will most naturally be used for interval estimation and
hypothesis tests. To be able to derive critical values, one needs resampling
procedures. Here we propose the multiplier bootstrap for the empirical pair-copula
process in (16). It resembles the approach for the ordinary empirical copula
process, proposed by Rémillard and Scaillet (2009) and studied in Bücher and Dette
(2010) and Segers (2012).

Consider first the bivariate empirical process

$$ \alpha_{ij|v,n}(u_i, u_j) = n^{-1/2} \sum_{t=1}^{n} \{I(U_{i|v,t} \le u_i,\; U_{j|v,t} \le u_j) - C_{ij|v}(u_i, u_j)\}, $$

based upon the random variables $U_{k|v,t}$ from (15). Let $\xi_1, \ldots, \xi_n$
be independent and identically distributed random variables, independent of the
original sample $X_1, \ldots, X_n$, with mean zero, unit variance, and a finite
absolute moment of some order larger than two, for instance from the standard
normal distribution. By Lemma A.1 in Rémillard and Scaillet (2009), the process

$$
\begin{aligned}
\alpha'_{ij|v,n}(u_i, u_j)
&= n^{-1/2} \sum_{t=1}^{n} \xi_t \{I(U_{i|v,tn} \le u_i,\; U_{j|v,tn} \le u_j) - C_{ij|v,n}(u_i, u_j)\} \\
&= n^{-1/2} \sum_{t=1}^{n} (\xi_t - \bar\xi_n)\, I(U_{i|v,tn} \le u_i,\; U_{j|v,tn} \le u_j)
\end{aligned}
$$

is an asymptotically independent distributional copy of $\alpha_{ij|v,n}$. We
therefore propose

$$ \hat\alpha'_{ij|v,n}(u_i, u_j) = n^{-1/2} \sum_{t=1}^{n} (\xi_t - \bar\xi_n)\, I(\hat U_{i|v,tn} \le u_i,\; \hat U_{j|v,tn} \le u_j) $$

as a bootstrap resample of $\alpha_{ij|v,n}(u_i, u_j)$. In view of equation (16),
we then suggest resampling $\hat{\mathbb{C}}_{ij|v,n}(u_i, u_j)$ by

$$
\hat{\mathbb{C}}'_{ij|v,n}(u_i, u_j)
= \hat\alpha'_{ij|v,n}(u_i, u_j)
- \hat C^{[1]}_{ij|v,n}(u_i, u_j)\, \hat\alpha'_{ij|v,n}(u_i, 1)
- \hat C^{[2]}_{ij|v,n}(u_i, u_j)\, \hat\alpha'_{ij|v,n}(1, u_j).
\tag{21}
$$

Repeating the procedure for $B$ independent rows $\xi_1^{(b)}, \ldots, \xi_n^{(b)}$,
with $b \in \{1, \ldots, B\}$, gives $B$ approximately independent distributional
copies of $\alpha_{ij|v,n}(u_i, u_j)$, and thus of
$\hat{\mathbb{C}}_{ij|v,n}(u_i, u_j)$. The pointwise sample variance of these $B$
resamples may serve as an alternative estimator $\hat\sigma^2_{ij|v}(u_i, u_j)$ of
(20). Two-sided asymptotic confidence intervals for $C_{ij|v}(u_i, u_j)$ with
confidence level $1 - \alpha$ can be obtained by

$$ \bigl[\hat C_{ij|v,n} - n^{-1/2} q_{n,1-\alpha/2},\; \hat C_{ij|v,n} - n^{-1/2} q_{n,\alpha/2}\bigr], \tag{22} $$

where $q_{n,\beta}$ is either the bootstrap estimate of the $\beta$-quantile of
$\hat{\mathbb{C}}_{ij|v,n}(u_i, u_j)$, that is, the $\beta$-percentile of the
bootstrap samples, or $\hat\sigma_{ij|v}\, \Phi^{-1}(\beta)$, where $\Phi^{-1}$ is
the quantile function of the standard normal distribution.
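The multiplier scheme of this section can be sketched at a single evaluation point as follows. The code is a minimal illustration, not the authors' implementation: the function names are hypothetical, the multipliers are standard normal, the partial derivative in the first argument is estimated by the same windowing idea as the one in the second argument (an assumption made here by symmetry), and the percentile rule is a simple order-statistic choice.

```python
import math
import random


def weighted_indicator_sum(u_i, u_j, a, b, weights):
    """n^{-1/2} times the sum of weights[t] over points with u_i[t] <= a, u_j[t] <= b."""
    n = len(u_i)
    total = sum(w for ui, uj, w in zip(u_i, u_j, weights) if ui <= a and uj <= b)
    return total / math.sqrt(n)


def partial_derivative(u_first, u_second, a, b, h):
    """Share of points with u_first <= a among those with |u_second - b| <= h."""
    window = [p for p, q in zip(u_first, u_second) if abs(q - b) <= h]
    return sum(p <= a for p in window) / len(window) if window else 0.0


def multiplier_resamples(u_i, u_j, a, b, h, n_boot, rng):
    """Bootstrap copies of the pair-copula process at the point (a, b)."""
    n = len(u_i)
    c1 = partial_derivative(u_j, u_i, b, a, h)   # derivative in the first argument
    c2 = partial_derivative(u_i, u_j, a, b, h)   # derivative in the second argument
    out = []
    for _ in range(n_boot):
        xi = [rng.gauss(0, 1) for _ in range(n)]
        mean = sum(xi) / n
        w = [x - mean for x in xi]               # centred multipliers
        alpha_ab = weighted_indicator_sum(u_i, u_j, a, b, w)
        alpha_a1 = weighted_indicator_sum(u_i, u_j, a, 1.0, w)
        alpha_1b = weighted_indicator_sum(u_i, u_j, 1.0, b, w)
        out.append(alpha_ab - c1 * alpha_a1 - c2 * alpha_1b)
    return out


def percentile_interval(estimate, resamples, n, alpha):
    """Interval [estimate - q_hi / sqrt(n), estimate - q_lo / sqrt(n)]."""
    s = sorted(resamples)
    q_lo = s[int(math.floor(alpha / 2 * (len(s) - 1)))]
    q_hi = s[int(math.ceil((1 - alpha / 2) * (len(s) - 1)))]
    return estimate - q_hi / math.sqrt(n), estimate - q_lo / math.sqrt(n)
```

A typical call passes the pseudo-observations of the pair at the level of interest, the evaluation point, a bandwidth and a seeded `random.Random` instance, and then feeds the returned list together with the point estimate to `percentile_interval`.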



3.4 Simulation studies of the asymptotic distribution 

We have substantiated the conjectured expansion (16), the limiting distribution
(19) with variance (20), and the resampling procedure from Section 3.3 through
simulation. The study includes different types of structures, pair-copula models
and parameter values. In each experiment, we have generated 1,000 samples of size
$n$ from the model in question, with $n$ ranging from 100 to 100,000. For each
sample, we have computed $\hat{\mathbb{C}}_{ij|v,n}(u_i, u_j)$, as well as the
absolute difference between $\hat{\mathbb{C}}_{ij|v,n}(u_i, u_j)$ and the
expansion (16), in a set of chosen points $(u_i, u_j)$, at given levels of the
structure. The models of the study are five-dimensional and comprise the drawable
and regular vines of Figure 1 and a canonical vine, which is another special
case. The latter two are Gaussian, and the former either Gaussian, Student's t or
Gumbel.

Structure    Copula        Level    (0.1, 0.3)    (0.4, 0.2)    (0.7, 0.8)
Drawable     Gaussian      2        0.54          0.48          0.67
Drawable     Gaussian      3        0.40          0.70          0.69
Drawable     Gaussian      4        0.26          0.44          0.36
Drawable     Student's t   2        0.47          0.65          0.51
Drawable     Gumbel        2        0.37          0.91          0.45
Canonical    Gaussian      3        0.37          0.45          0.44
Regular      Gaussian      4        0.35          0.61          0.41

Table 1: P-values from the Kolmogorov-Smirnov goodness-of-fit tests on the
simulated processes $\hat{\mathbb{C}}_{ij|v,n}(u_i, u_j)$ with $n = 1{,}000$, in the
points (0.1, 0.3), (0.4, 0.2) and (0.7, 0.8), for different vine structures,
copula models and levels

The parameters of all the Gumbel copulae are $\theta = 1.5$. Further, the Gaussian
and Student's t correlations are $\rho$, $\rho/(1 + \rho)$, $\rho/(1 + 2\rho)$ and
$\rho/(1 + 3\rho)$ at the first, second, third and fourth level, respectively,
with $\rho = 0.2, 0.5, 0.8$. The corresponding degrees of freedom of the latter
are $\nu$, $\nu + 1$, $\nu + 2$ and $\nu + 3$, with $\nu = 6$.

According to (16), the absolute difference between the left and right hand sides
should decrease and eventually vanish as $n$ increases. The top row of Figure 2
shows the mean of these differences in the point (0.3, 0.7), over the simulations
from the Gaussian drawable vine with $\rho = 0.5$, for growing $n$, on log-log
scale. Indeed, these decrease, though rather slowly. The rate of convergence
appears to be approximately of the same order as for the ordinary empirical copula
process, namely $n^{-1/4}$, which is to be expected.

Furthermore, we have tested the limiting distribution (19) of
$\hat{\mathbb{C}}_{ij|v,n}(u_i, u_j)$ with variance (20), using the
Kolmogorov-Smirnov goodness-of-fit test. Table 1 shows the corresponding p-values
in the three points (0.1, 0.3), (0.4, 0.2) and (0.7, 0.8) for a selection of
models and levels, with $\rho = 0.5$ and $n = 1{,}000$. The consistently high
p-values indicate that the assumed distribution fits the samples well. This is
confirmed by normal QQ-plots and histograms of the samples, superposed by the
asymptotic probability density functions. These are displayed in the lower two
rows of Figure 2 for the Gaussian drawable vine with $n = 1{,}000$. Examples with
other structures and copulae may be found in the supplement.

As seen from (10) and (14), the estimators depend on a bandwidth parameter $h_n$.
This bandwidth should obviously be proportional to some power of $n^{-1}$, at
least $n^{-1/2}$ to guarantee consistent estimators. Viewing
$\hat C^{[2]}_{ij,n}(u_i, u_j)$ and $\hat C^{[2]}_{ij|v,n}(u_i, u_j)$ as predictions
of the conditional expectations $E\{I(U_i \le u_i) \mid U_j = u_j\}$ and
$E\{I(U_{i|v} \le u_i) \mid U_{j|v} = u_j\}$, respectively, one could construct a
cross-validation procedure for bandwidth selection. However, this is non-trivial
since the observations are indicator functions, the choice of points $(u_i, u_j)$
is not obvious and there are up to $(d - 1)(d - 2)$ predictors to evaluate.

In order to investigate the influence of the bandwidth on the estimators, we have
repeated the simulation of the Gaussian drawable vine with each of the three
bandwidths $h_n = n^{-1/5}$, $n^{-1/3}$ and $0.5\,n^{-1/3}$. The first of these is
proportional to Silverman's rule. The other two are undersmoothing alternatives.
As mentioned earlier, the optimal choice of bandwidth does not depend on the
margins, but only on the dependence structure. For $\rho = 0.2$, the estimators
behave well for all three bandwidths, but for higher dependencies, i.e.
$\rho = 0.5$ and $0.8$, $h_n = 0.5\,n^{-1/3}$ is without a doubt the only sensible
choice. We have therefore used that bandwidth throughout the paper. Intuitively,
it makes sense to undersmooth the estimators, i.e. to minimize the bias at the
expense of the variance. The pointwise estimates of the conditional distributions
may then differ considerably from the true values, but the copula estimator will
average out these discrepancies.

It remains to test the proposed resampling procedure. We have simulated from the
Gaussian, Student's t and Gumbel drawable vines from before, in dimension
$d = 4$, with $\rho = 0.5$ and $n = 1{,}000$. For each sample, we have generated
$B = 1{,}000$ multiplier bootstrap estimates of the confidence interval (22) of
the top level copula $C_{14|23}$, evaluated in (0.4, 0.2), with
$\alpha = \{0.1, 0.05, 0.01\}$, using both approaches, as well as plug-in
estimates, based on $\hat\sigma^2_{14|23}$.

As suggested earlier, parametric estimators are more efficient when the model 
is correctly specified, or at least close to the truth. However, we believe that the 
empirical estimator is more robust. Therefore, we have computed correspond- 
ing percentile confidence intervals based on parametric bootstrap, assuming both 
correct and incorrect copula families in the lower two levels, but always the true 
family for the copula of interest. More specifically, we estimated the intervals for 
the Gaussian model, assuming first the true model, and then Gumbel copulae in the 
first two levels. We repeated this for the Student's t model with Gumbel copulae, 
and for the Gumbel model with Gaussian copulae. The estimator we have used is
the stepwise semiparametric estimator; see for instance Hobæk Haff (2012).

Table 2 shows the confidence intervals' average length and actual coverage, i.e.
the fraction of intervals that contain the true value of $C_{14|23}(0.4, 0.2)$.
The ones based on the multiplier bootstrap percentiles are shorter than the
symmetric ones. Further, the plug-in estimator $\hat\sigma_{14|23}$ is
surprisingly good. Of course, all these intervals are longer than the ones
obtained with the parametric estimator. Moreover, their actual coverage is
consistently higher than the nominal one. However, the misspecified models
produce intervals with substantially lower coverage than the chosen confidence
levels. Hence, tests based on the empirical estimator are expected to be
conservative, and thus less powerful than parametric equivalents, but on the
other hand more robust towards misspecifications in lower levels. Another
advantage of the multiplier bootstrap scheme is that it is much faster than the
parametric one, especially for the Student's t model. Also note that for this
particular model, the parametric intervals made under the true model assumptions
have smaller coverage than the nominal one, which probably means that
$B = 1{,}000$ is insufficient in this case. Naturally, the misspecifications in
the above experiments are not very realistic, but merely meant as an illustration
of how errors propagate from level to level. In practice, one should be able to
choose reasonably well at least at the first level.

                        Non-parametric                             Parametric
Model         alpha     Percentile    Symmetric     Plug-in        Correct       Incorrect
Gaussian      0.1       0.94 (2.1)    0.94 (2.1)    0.94 (2.1)     0.90 (1.1)    0.86 (1.1)
Gaussian      0.05      0.97 (2.5)    0.98 (2.5)    0.98 (2.5)     0.95 (1.3)    0.92 (1.3)
Gaussian      0.01      0.99 (3.2)    0.99 (3.2)    0.99 (3.3)     0.99 (1.7)    0.97 (1.7)
Student's t   0.1       0.91 (2.1)    0.91 (2.1)    0.91 (2.1)     0.89 (1.2)    0.00 (1.1)
Student's t   0.05      0.95 (2.5)    0.95 (2.5)    0.95 (2.5)     0.94 (1.4)    0.00 (1.4)
Student's t   0.01      0.99 (3.2)    0.99 (3.3)    0.99 (3.3)     0.98 (1.9)    0.00 (1.8)
Gumbel        0.1       0.91 (2.0)    0.91 (2.0)    0.91 (2.0)     0.91 (1.0)    0.68 (1.0)
Gumbel        0.05      0.96 (2.4)    0.96 (2.4)    0.96 (2.4)     0.96 (1.2)    0.79 (1.2)
Gumbel        0.01      0.99 (3.1)    0.99 (3.1)    0.99 (3.1)     0.99 (1.6)    0.91 (1.5)

Table 2: Coverage of the estimated confidence intervals for $C_{14|23}(0.4, 0.2)$
in the Gaussian, Student's t and Gumbel models with $n = 1{,}000$. The average
lengths, in parentheses, are multiplied by $10^2$.

We repeated the above simulations for the Gaussian model with $B = 500$ and with
$n = 10{,}000$. The results were as expected. When $n$ increases, the interval
lengths obviously decrease, whereas the actual coverage varies more for smaller
$B$. We therefore use $n = B = 1{,}000$ in the remaining sections.



4 Methods derived from the empirical pair-copula 

4.1 Estimating the conditional Spearman correlation 

The Spearman correlation $\rho_S$ of the bivariate copula $C$ of a random pair
$(U, V)$ with uniform (0, 1) margins can be expressed as

$$ \rho_S(C) = 12 \int_{[0,1]^2} C(u, v)\, d(u, v) - 3 = \mathrm{cor}(U, V). $$

Similarly, $\rho_S(C_{ij|v})$ is a measure of association between $X_i$ and $X_j$,
conditionally on $X_v$. This quantity can be estimated by the plug-in estimator
$\rho_S(\hat C_{ij|v,n}) = 12 \int \hat C_{ij|v,n} - 3$, which is approximately
equal to the sample correlation of the pairs
$(\hat U_{i|v,tn}, \hat U_{j|v,tn})$, for $t = 1, \ldots, n$. The expansion of the
empirical pair-copula process in (16) implies that

$$ n^{1/2}\{\rho_S(\hat C_{ij|v,n}) - \rho_S(C_{ij|v})\} \xrightarrow{d} 12 \int_{[0,1]^2} \mathbb{C}_{ij|v}(u_i, u_j)\, d(u_i, u_j), \tag{23} $$

where $\mathbb{C}_{ij|v}$, the conjectured large-sample limit of
$\hat{\mathbb{C}}_{ij|v,n}$, is a centred Gaussian process on $[0, 1]^2$ with
covariance function determined by the right-hand side of (16). The limiting random
variable in (23) is a centred normal random variable with variance

$$ \sigma^2 = 144 \int_{[0,1]^2} \int_{[0,1]^2} \mathrm{cov}\{\mathbb{C}_{ij|v}(u_i, u_j), \mathbb{C}_{ij|v}(u_i', u_j')\}\, d(u_i, u_j)\, d(u_i', u_j'). $$

This variance can be estimated either by a plug-in estimator or via the
multiplier resampling scheme described in Section 3.3. The latter procedure
consists in resampling $\hat{\mathbb{C}}_{ij|v,n}$ and integrating it either by
numerical or Monte Carlo integration over $[0, 1]^2$. One may then estimate
$\sigma^2$ by the sample variance of the $B$ resamples of
$12 \int \hat{\mathbb{C}}'_{ij|v,n}$. Further, confidence intervals for
$\rho_S(C_{ij|v})$ can be constructed either via the normal approximation with
estimated variance $\hat\sigma^2$ or by using resample percentiles.
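The plug-in estimator of the Spearman correlation has a closed form, because the integral of the indicator I(U <= a, V <= b) over the unit square in (a, b) equals (1 - U)(1 - V). The sketch below computes it next to the sample correlation of the pseudo-observations; the function names are hypothetical and the inputs are assumed to be normalized ranks at the level of interest.

```python
def spearman_plugin(u, v):
    """12 times the integral of the empirical copula over [0, 1]^2, minus 3.
    Each observation contributes (1 - u_t) * (1 - v_t) to the integral."""
    n = len(u)
    return 12.0 * sum((1.0 - a) * (1.0 - b) for a, b in zip(u, v)) / n - 3.0


def rank_correlation(u, v):
    """Pearson correlation of the pseudo-observations."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


if __name__ == "__main__":
    n = 99
    u = [t / (n + 1) for t in range(1, n + 1)]    # perfectly ordered ranks
    print(spearman_plugin(u, u), rank_correlation(u, u))
    print(spearman_plugin(u, u[::-1]), rank_correlation(u, u[::-1]))
```

For perfectly ordered ranks with n = 99 the plug-in value works out to (n - 1)/(n + 1) = 0.98 while the sample correlation is 1, which illustrates the sense in which the two are only approximately equal.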

In order to verify (23), we have simulated from the same four-dimensional models
as in the last part of Section 3.4, computing (23) for each of the 1,000 samples.
The p-values from the Kolmogorov-Smirnov tests are 0.89, 0.92 and 0.68,
respectively, for the three models, which clearly agrees with the conjecture.
Normal QQ-plots and histograms are shown in the supplement.

Moreover, we have tested the suggested resampling scheme for
$12 \int \mathbb{C}_{14|23}$ in the same way as in Section 3.4. The corresponding
results, shown in the supplement, are very similar. The confidence intervals
based on the empirical estimator are longer and have larger actual coverage than
the parametric equivalents, whereas the latter are non-robust towards
misspecifications in lower levels. Once more, the intervals based on the
multiplier bootstrap percentiles appear to be the best of the empirical ones. The
plug-in estimator of the variance $\sigma^2$ is also rather good, but
computationally much slower than the multiplier bootstrap.



4.2 Vine structure selection 

Selecting the structure of a pair-copula construction consists in choosing which
variables to associate with a pair-copula at each level. As the model uncertainty
increases with the level, the state of the art is to try to capture as much of
the dependence as possible in the lower levels of the structure. Aas et al. (2009)
propose ordering the variables of a drawable vine in the way that maximizes the
tail dependencies at the ground level, while Dissmann et al. (2011) suggest a
model selection algorithm for more general regular vines, that maximizes the sum
of absolute values of Kendall's $\tau$ coefficients at each level. Both these
schemes require the simultaneous choice and estimation of parametric copulae. At
the ground level, the latter algorithm only uses the sample Kendall's $\tau$s, and
therefore does not call for assumptions about the pair-copulae. However, from the
second level on, the $\tau$s involve the unobserved variables
$F_{i|v}(X_i \mid X_v)$, that are estimated parametrically from the copulae in the
previous level via (7). Inadequate choices of copulae may thus influence the
structure selection at the higher levels.

We propose a more robust model selection scheme, based on our nonparametric
estimate of the Spearman correlation $\rho_S$.

1. Compute the ground level normalized ranks $\hat U_{i,tn}$ $(i = 1, \ldots, d;\; t = 1, \ldots, n)$.

2. Compute $\rho_S(C_{ij,n})$ for all pairs $\{i, j\} \subset \{1, \ldots, d\}$ such that $i < j$.

3. Select the spanning tree $T_1$ on $\{1, \ldots, d\}$ that maximizes $\sum_{\{i,j\} \in T_1} |\rho_S(C_{ij,n})|$.

4. Estimate $\hat U_{i|j,tn}$ and $\hat U_{j|i,tn}$ for all selected pairs $\{i, j\}$, using (10).

5. For levels $\ell = 2, \ldots, d - 1$:

(a) Compute $\rho_S(\hat C_{ij|v,n})$ for all possible pairs $\{i, j\}$.

(b) Select the spanning tree $T_\ell$ that maximizes $\sum_{\{i,j\} \in T_\ell} |\rho_S(\hat C_{ij|v,n})|$.

(c) Estimate $\hat U_{i|j \cup v,tn}$ and $\hat U_{j|i \cup v,tn}$ for all selected pairs $\{i, j\}$, using (14).
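Steps 2 and 3 of the algorithm above amount to a maximum spanning tree problem with absolute Spearman correlations as edge weights. The sketch below solves it with Kruskal's algorithm and a small union-find structure; this choice of algorithm and all function names are assumptions made for illustration, and the restriction to "possible pairs" that applies from the second level on is not implemented here.

```python
from itertools import combinations


def ranks(z):
    """Normalized ranks in (0, 1)."""
    n = len(z)
    return [sum(a <= b for a in z) / (n + 1) for b in z]


def correlation(u, v):
    """Pearson correlation; applied to ranks it is the Spearman correlation."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def max_spanning_tree(weights, d):
    """Kruskal's algorithm on edges {(i, j): weight}, largest |weight| first."""
    parent = list(range(d))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]   # path halving
            k = parent[k]
        return k

    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -abs(kv[1])):
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j))
    return tree


def select_first_tree(columns):
    """Ground-level tree from a list of d data columns of equal length."""
    d = len(columns)
    u = [ranks(c) for c in columns]
    weights = {(i, j): correlation(u[i], u[j]) for i, j in combinations(range(d), 2)}
    return max_spanning_tree(weights, d), weights
```

For example, with weights 0.9 for the pair (0, 1), -0.8 for (1, 2) and 0.5 for (0, 2), the selected tree consists of the edges (0, 1) and (1, 2).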

The above algorithm is strongly inspired by Dissmann et al. (2011), who also
explain the concept of possible pairs. We merely estimate the copulae and
conditional distributions nonparametrically rather than parametrically and use
Spearman's $\rho_S$ instead of Kendall's $\tau$. The substitution of dependence
measures should not influence the results too much. When the model is well
specified, one would therefore expect the two algorithms to select virtually the
same structure.

The algorithm is put into practice in Section 5.1, where it is found to impose
quite a reasonable structure on a set of financial variables.



4.3 Testing for conditional independence 

The number of parameters in a pair-copula construction grows rapidly with increas- 
ing dimension d. Identifying independence copulae in the structure is one way of 
reducing this number. One may therefore add tests for conditional independence 
as a step in the model selection algorithm of Section 4.2.

In case $C_{ij|v}$ is the independence copula, equation (fTBT ) implies that the
asymptotic distribution of $C_{ij|v,n}$ is the same as the one of the bivariate empirical
copula under independence. In other words, the random vectors $(\hat U_{i|v,tn}, \hat U_{j|v,tn})$ for
$t \in \{1, \ldots, n\}$ behave in distribution as the sample of bivariate normalized ranks
from a random sample of a bivariate distribution with independent components.
Therefore, rank-based tests for independence can be applied without adjustment of
the critical values.

We propose to test the null hypothesis of conditional independence of $X_i$ and
$X_j$, given $X_v$, by the Cramér-von Mises test statistic

$$\int_{[0,1]^2} \mathbb{C}_{ij|v,n}^2(u_i, u_j) \, dC_{ij|v,n}(u_i, u_j) = \frac{1}{n} \sum_{t=1}^{n} \mathbb{C}_{ij|v,n}^2(\hat U_{i|v,tn}, \hat U_{j|v,tn}),$$

where $\mathbb{C}_{ij|v,n}(u_i, u_j) = n^{1/2} \{C_{ij|v,n}(u_i, u_j) - u_i u_j\}$ is the empirical pair-copula process
under the null hypothesis of conditional independence, that is, $C_{ij|v}(u_i, u_j) = u_i u_j$
for all $(u_i, u_j) \in [0, 1]^2$. Under the null hypothesis, the limit distribution of the
test statistic is distribution free and is given by $\int_{[0,1]^2} \mathbb{C}^2(u, v) \, d(u, v)$, where $\mathbb{C}$ is the
limiting empirical copula process under independence. Critical values of the test
statistic can be obtained by Monte Carlo estimation based on random samples from
a distribution with independent components.
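
A minimal sketch of this test, assuming that the estimated conditional
distributions are already available as two arrays of pseudo-observations, could
look as follows. The function names are hypothetical, the empirical distribution
function of the pseudo-observations is used as a stand-in for the empirical
pair-copula, the normalization of the ranks by n + 1 is an assumption, and the
number of Monte Carlo replications is an arbitrary choice.

```python
import numpy as np


def normalized_ranks(x):
    """Ranks of x divided by n + 1 (assumed normalization; ties not handled)."""
    x = np.asarray(x)
    return (np.argsort(np.argsort(x)) + 1) / (len(x) + 1)


def cvm_statistic(u, v):
    """Cramer-von Mises statistic: sum over t of {C_n(u_t, v_t) - u_t v_t}^2."""
    u, v = np.asarray(u), np.asarray(v)
    # below[t, s] is True when observation s lies below observation t
    # in both coordinates; the row means give C_n at the sample points.
    below = (u[None, :] <= u[:, None]) & (v[None, :] <= v[:, None])
    c_n = below.mean(axis=1)
    return float(np.sum((c_n - u * v) ** 2))


def mc_critical_value(n, alpha=0.05, n_rep=500, seed=0):
    """Monte Carlo (1 - alpha)-quantile of the statistic under independence."""
    rng = np.random.default_rng(seed)
    stats = [
        cvm_statistic(normalized_ranks(rng.random(n)),
                      normalized_ranks(rng.random(n)))
        for _ in range(n_rep)
    ]
    return float(np.quantile(stats, 1 - alpha))


def independence_test(u, v, alpha=0.05, n_rep=500, seed=0):
    """Return the statistic, the critical value and the rejection decision."""
    stat = cvm_statistic(u, v)
    crit = mc_critical_value(len(u), alpha=alpha, n_rep=n_rep, seed=seed)
    return stat, crit, stat > crit
```

Since the limit distribution is distribution free, the critical value depends
only on n and on the level, so it can be computed once and reused for all
pair-copulae of a given data set.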



Once more, we have compared our test with parametric equivalents, based on
parametric bootstrap, on the four-dimensional models from Section 3.4, but with
the top level copula $C_{14|23}(u_1, u_4) = u_1 u_4$. Table 3 shows the rejection rates at levels
$\alpha \in \{0.1, 0.05, 0.01\}$. Again, the tests based on the empirical estimator appear to
be conservative, that is, the rejection rates are consistently lower than the specified
levels. As anticipated, the parametric tests are more powerful under correct model
assumptions, but the rejection rates are slightly too high for the Student's t model,
which seems to require a higher B. Moreover, the rejection rates are too high under
incorrect model assumptions, which demonstrates these tests' lack of robustness.



4.4 Goodness-of-fit testing 

In the parametric case, model selection consists in choosing not only the
structure, as described in Section 4.2, but also the families of the d(d - 1)/2 copulae.
Goodness-of-fit tests can help to assess whether the selected model represents the
dependence structure well. At the ground level, one may simply apply the standard
tests, for instance the ones studied in Genest et al. (2009). From the second level
on, it becomes more complicated, since the copula arguments are themselves
unknown conditional distributions, derived from a cascade of pair-copulae at lower
levels.

Following the reasoning of Section 4.3, we propose a Cramér-von Mises
goodness-of-fit test, more specifically, the test proposed by Genest and Rémillard (2008),
replacing the normalized ranks by our non-parametric estimators of the conditional
distributions. Critical values may then be obtained by the bootstrap procedure they
describe, again substituting the normalized ranks by our estimators.

Testing this procedure on the top level copula of the four-dimensional Gumbel
model from Section 3.4 with n = 1,000, we obtained rejection rates of 0.098,
0.042 and 0.0044 for the null hypothesis that it is a Gumbel copula at levels 0.1,
0.05 and 0.01, respectively. For the hypotheses that it is Student's t and Gaussian,
the corresponding rates were 0.90, 0.83, 0.60 and 0.91, 0.84, 0.62. Hence the
latter two hypotheses are clearly rejected, while the true model, Gumbel, is not,
as it should be.



5 Data examples 
5.1 Financial data set 

The financial data set consists of nine Norwegian and international daily price
series from March 25th, 2003, to March 26th, 2006, which corresponds to 1107
observations. These include the Norwegian 5- and 6-year Swap Rates (NI5 and
NI6), the 5-year German Government Rate (GI5), the BRIX Norwegian Bond
Index (NB) and ST2X Government Bond Index (MM), the WGBI Citigroup World
Government Bond Index (IB), the OSEBX Oslo Stock Exchange Main Index (NS),
the MSCI Morgan Stanley World Index (IS) and the Standard & Poor Hedge Fund
Index (HF). This is a subset of the 19 variables analyzed in Brechmann et al.
(2012), which represent the market portfolio of one of the largest Norwegian
financial institutions. We have followed their example, and filtered each of the
series with an appropriate time series model to remove the temporal dependence.
Subsequently, we have modelled the standardized residuals with a regular vine.

                                                 Parametric
Model         $\alpha$    Non-parametric     Correct    Incorrect

Gaussian      0.1         0.099              0.10       0.22
              0.05        0.049              0.050      0.14
              0.01        0.0096             0.010      0.034

Student's t   0.1         0.097              0.11       0.16
              0.05        0.047              0.068      0.082
              0.01        0.0093             0.022      0.042

Gumbel        0.1         0.094              0.099      0.23
              0.05        0.044              0.048      0.14
              0.01        0.0088             0.0090     0.048

Table 3: Rejection rates from the Cramér-von Mises tests for conditional
independence at the third level of the Gaussian, Student's t and Gumbel models with
n = 1,000

We selected the vine structure, first with the method proposed in Section 4.2
and then with the method of Dissmann et al. (2011). The two selected structures
were actually identical, which is reassuring. The dependence in the ground level
appears to be very strong, with Spearman rank correlations that are large in absolute
value. In the remaining levels, the Spearman correlations are considerably smaller,
and only 9 out of 28 copulae were significantly different from the independence
copula at level 0.05, according to the test from Section 4.3. Hence, most of the
dependence has been captured in the ground level, shown in Figure 3, which was
the aim.

The collection of pairs selected by the algorithm at this level is quite reason- 
able. The three stock indices and the three interest rates are grouped together, 
whereas the Norwegian bond indices are dependent on the international bond in- 
dex via the interest rates. 



5.2 Precipitation data set 

The precipitation set is composed of daily recordings from January 1st, 1990, to
December 31st, 2006, at five different meteorological stations in Norway: Vestby,
Ski, Lørenskog, Nannestad and Hurdal. This data set was used in both Berg and Aas
(2009) and Hobæk Haff (2012). As in those papers, we have modelled only the
positive precipitation, discarding all observations for which at least one of the
stations has recorded zero precipitation. The remaining 2013 observations appear to
be fairly independent in time. We model these with a drawable vine, ordering the
stations according to geography. The model is quite natural since the stations are
located almost on a straight line, from Vestby in the South to Hurdal in the North;
see the map in the supplement. The parametric model used for comparison is the
one from Hobæk Haff (2012), with Gumbel copulae at the ground level and
subsequently Gaussian ones.
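
The selection of the positive precipitation described above amounts to a simple
row filter. A minimal sketch, assuming the recordings are stored in an array with
one row per day and one column per station (the layout and the function name
`keep_all_positive` are assumptions made for illustration), could be:

```python
import numpy as np


def keep_all_positive(precip):
    """Keep the days on which every station recorded positive precipitation."""
    precip = np.asarray(precip, dtype=float)
    # A day is discarded as soon as one station has recorded zero precipitation.
    return precip[(precip > 0).all(axis=1)]
```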

Since rain showers tend to be rather local, one would expect the dependence
to be strongest between the closest stations, and decrease with the level, possibly
even down to conditional independence. Therefore we have tested the second,
third and fourth level copulae for conditional independence, both with the
non-parametric test from Section 4.3 and equivalent parametric tests. The Spearman
rank correlations at the ground level range from 0.82 to 0.94, indicating a strong
positive dependence. At the second level, they are considerably lower, but the
hypothesis of conditional independence is rejected for all copulae, by both tests,
which actually agree in the last two levels as well. The conditional copula of the
measurements from Vestby and Nannestad, given the two stations in between, is
also significantly different from independence. This is not true for Ski and Hurdal,
conditioning on Lørenskog and Nannestad, and neither for the top level copula,
linking Vestby and Hurdal, conditionally on the three stations in between.

Figure 2: Results for the Gaussian drawable vine with $\rho = 0.5$ at levels 2, 3 and 4
(in columns 1, 2 and 3, respectively). The first row shows the means of the samples
of the absolute difference between $\mathbb{C}_{ij|v,n}(0.3, 0.7)$ and expansion (16), for n from
$10^2$ to $10^5$, on log-log scale (the original values are on the axes). The last two rows
display normal QQ-plots and histograms of the samples of $\mathbb{C}_{ij|v,n}(0.4, 0.2)$ with
n = 1,000, respectively, the latter superposed by the limiting probability density
functions from (19).




Figure 3: Ground level of the regular vine selected for the financial data in
Section 5.1. The thickness of the edges is determined by the absolute value of the
corresponding Spearman rank correlations.

A Supplement: Extra figures and tables




Figure 4: Normal QQ-plots and histograms of the samples of $\mathbb{C}_{ij|v,n}(0.4, 0.2)$ at
levels 3 and 4 of the Gaussian canonical and regular vines (column 1 and 2),
respectively, with $\rho = 0.5$ and n = 1,000, along with the corresponding conjectured
limiting probability density functions

Figure 5: Normal QQ-plots and histograms of the samples of $\mathbb{C}_{13|2,n}(0.4, 0.2)$ from
the Student's t and Gumbel drawable vines (column 1 and 2) with n = 1,000
and ($\rho = 0.5$, $\nu = 6$) and $\alpha = 1.5$, respectively, along with the corresponding
conjectured limiting probability density functions



                          Non-parametric                               Parametric
Model         $\alpha$    Percentile     Symmetric      Plug-in        Correct        Incorrect

Gaussian      0.1         0.91 (0.99)    0.91 (0.99)    0.92 (1.0)     0.90 (0.94)    0.86 (0.93)
              0.05        0.95 (1.2)     0.95 (1.2)     0.95 (1.2)     0.95 (1.1)     0.92 (1.1)
              0.01        0.99 (1.5)     0.99 (1.6)     0.99 (1.6)     0.99 (1.5)     0.97 (1.5)

Student's t   0.1         0.90 (1.0)     0.90 (1.0)     0.91 (1.1)     0.89 (1.0)     0.00 (0.99)
              0.05        0.95 (1.2)     0.95 (1.2)     0.95 (1.3)     0.93 (1.2)     0.00 (1.2)
              0.01        0.99 (1.6)     0.99 (1.6)     0.99 (1.7)     0.97 (1.6)     0.00 (1.5)

Gumbel        0.1         0.92 (0.88)    0.92 (0.88)    0.92 (0.97)    0.91 (0.83)    0.68 (0.80)
              0.05        0.96 (1.0)     0.96 (1.1)     0.96 (1.2)     0.95 (0.98)    0.79 (0.95)
              0.01        0.99 (1.4)     0.99 (1.4)     0.99 (1.5)     0.99 (1.3)     0.91 (1.2)

Table 4: Coverage and length (the latter in parentheses) of the estimated
confidence intervals for $\rho(C_{14|23})$ in the Gaussian, Student's t and Gumbel models
with n = 1,000. The lengths are multiplied by 10.

Figure 6: Normal QQ-plots and histograms of the samples of $12 \int \mathbb{C}_{14|23,n}$ from the
Gaussian, Student's t and Gumbel models with n = 1,000 and $\rho = 0.5$,
($\rho = 0.5$, $\nu = 6$) and $\alpha = 1.5$, respectively, along with the corresponding conjectured
limiting probability density functions

Figure 7: Meteorological stations where the precipitation data in Section 5.2 were
recorded